Visualization with ggplot2

(A few remarks and tips before the practical session)

R is the best technology for doing computational science

(Subjectively.)

ggplot2 is the most powerful visualization framework

(Objectively.)

ggplot2 is a core tidyverse package

“Grammar of Graphics”

A formal syntax and grammar for describing visualizations

What does this mean?

Let’s consider base R plotting


library(palmerpenguins)
library(dplyr)  # provides glimpse() and filter() used below

glimpse(penguins[1:4, ])
Rows: 4
Columns: 8
$ species           <fct> Adelie, Adelie, Adelie, Adelie
$ island            <fct> Torgersen, Torgersen, Torgersen, Torgersen
$ bill_length_mm    <dbl> 39.1, 39.5, 40.3, NA
$ bill_depth_mm     <dbl> 18.7, 17.4, 18.0, NA
$ flipper_length_mm <int> 181, 186, 195, NA
$ body_mass_g       <int> 3750, 3800, 3250, NA
$ sex               <fct> male, female, female, NA
$ year              <int> 2007, 2007, 2007, 2007

A base R histogram

hist(penguins$body_mass_g)

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000))

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100))

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Histogram of body mass of penguins")

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Histogram of body mass of penguins",
     xlab = "Body mass [grams]")

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Histogram of body mass of penguins",
     xlab = "Body mass [grams]", ylab = "Count of individuals")

A base R histogram

hist(penguins$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Histogram of body mass of penguins",
     xlab = "Body mass [grams]", ylab = "Count of individuals",
     col = "black", border = "white")

A base R histogram

# Put three panels side by side in a single figure
par(mfrow = c(1, 3))

# Each panel needs its own subset and its own, nearly identical, hist() call
species_1 <- filter(penguins, species == "Adelie")
hist(species_1$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Species 'Adelie'",
     xlab = "Body mass [grams]", ylab = "Count of individuals",
     col = "darkgreen", border = "white")

species_2 <- filter(penguins, species == "Chinstrap")
hist(species_2$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Species 'Chinstrap'",
     xlab = "Body mass [grams]", ylab = "Count of individuals",
     col = "darkblue", border = "white")

species_3 <- filter(penguins, species == "Gentoo")
hist(species_3$body_mass_g, xlim = c(2000, 8000), ylim = c(0, 100),
     main = "Species 'Gentoo'",
     xlab = "Body mass [grams]", ylab = "Count of individuals",
     col = "darkorange", border = "white")

# Restore the default single-panel layout
par(mfrow = c(1, 1))

A base R plot is produced by a single function call with a (huge) number of optional parameters that control its “aesthetic” properties and visual elements.

Creating a new figure, even from the same data, usually requires writing completely different code.
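A minimal sketch of that point, using the same penguins data. The choice of a boxplot and a scatter plot is only an illustration, and the titles and colors are arbitrary. Each figure type has its own function and its own calling convention:

```r
library(palmerpenguins)

# A boxplot of the same column uses a different function and a formula
# interface instead of a plain vector
boxplot(body_mass_g ~ species, data = penguins,
        main = "Body mass of penguins by species",
        xlab = "Species", ylab = "Body mass [grams]",
        col = "gray", border = "black")

# A scatter plot uses yet another function, with x and y passed as vectors
plot(penguins$flipper_length_mm, penguins$body_mass_g,
     main = "Body mass vs. flipper length",
     xlab = "Flipper length [mm]", ylab = "Body mass [grams]",
     pch = 19, col = "black")
```

Very little of the earlier hist() code carries over to either call.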

tidyverse provides a “grammar for data manipulation”…
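As a rough illustration of what such a grammar looks like, here is a small dplyr pipeline. The summary itself (count and mean body mass per species) and the name mass_by_species are assumptions made for this sketch, not part of the slides:

```r
library(dplyr)
library(palmerpenguins)

# Each verb does one thing, and verbs are chained into a "sentence"
mass_by_species <- penguins %>%
  filter(!is.na(body_mass_g)) %>%   # keep individuals with a measurement
  group_by(species) %>%             # one group per species
  summarise(n = n(),                # individuals per species
            mean_mass_g = mean(body_mass_g))

mass_by_species
```

Changing the question (say, grouping by island instead) means swapping one verb's argument, not rewriting the whole pipeline.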

ggplot2 provides a “grammar for visualizations”…

Layers in “Grammar of Graphics”


  • data: our data frame
  • aesthetics: “mapping” of columns to visual properties of a figure (x or y axis, color, shape, etc.)
  • geoms: graphical elements to be plotted (histograms, points, lines, etc.)
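The three layers listed above can be sketched in code as follows. This is a minimal example, assuming the penguins data from earlier; the object names p_hist and p_density, the bin count, and the alpha value are arbitrary choices for illustration:

```r
library(ggplot2)
library(palmerpenguins)

# data + aesthetics + geom, combined with `+`
p_hist <- ggplot(data = penguins,                  # data: our data frame
                 mapping = aes(x = body_mass_g)) + # aesthetics: column -> x axis
  geom_histogram(bins = 30, na.rm = TRUE)          # geom: histogram bars

# Same data, different figure: add one aesthetic and swap the geom
p_density <- ggplot(data = penguins,
                    mapping = aes(x = body_mass_g, fill = species)) +
  geom_density(alpha = 0.5, na.rm = TRUE)

p_hist
p_density
```

Unlike the base R examples, moving from one figure to the other changes only the layer that actually differs.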